Go board position analyzer
Games have been a defining part of human history. For humans they are a unique phenomenon: a challenge established within a defined set of abstract rules, and an example of what humans desire, overcoming obstacles and moving forward. These challenges are not required for survival; solving and mastering them is its own reward. A vast number of games, from the simple to the highly sophisticated, have been created throughout history, and tabletop games form one of their main categories. One of the best known is Go. Go's main feature is that its rules are simple enough to be explained in a short time, yet it may take a lifetime to master them to their full extent, which makes it one of the most difficult games ever created; after more than 2000 years it is still being studied. In chess, another tabletop game, automatic players are able to play at grandmaster level using techniques based on the exhaustive exploration of possible moves, such as min-max search with alpha-beta pruning, with the help of carefully designed evaluation functions. This approach is far less effective in Go: the game has so many possible moves that exhaustive search yields little advantage. Go is a complex game for humans to play, and even harder for computers. Despite its age, research on Go's computer analysis is ongoing: no optimal strategy has been found, the game remains unsolved, and it is considered one of the most difficult to handle. Because of this complexity, any valuable contribution in this area pushes the investigation forward.
The motivation of this project is to improve our knowledge of the computational analysis of Go. Following the reasoning above, a Go tool will be developed that can analyze any given Go board using an algorithm known as Monte-Carlo Tree Search (MCTS). Some playing agents used in Go competitions rely on this algorithm, but only to play; this project takes a different approach and uses MCTS to retrieve information about influence and other features of a Go board. Furthermore, the information retrieved will be further analyzed with machine learning techniques.
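The selection-expansion-simulation-backpropagation loop that MCTS performs can be illustrated on a deliberately tiny game. The sketch below is not the tool described here: it runs UCT-based MCTS on Nim (players alternately remove 1-3 stones; whoever takes the last stone wins), but the tree statistics play the same role they would on a Go board.

```python
import math
import random

class Node:
    """One node of the search tree: a Nim position plus visit statistics."""
    def __init__(self, stones, player, parent=None, move=None):
        self.stones = stones      # stones left on the board
        self.player = player      # player to move (1 or 2)
        self.parent = parent
        self.move = move          # move that led to this node
        self.children = []
        self.visits = 0
        self.wins = 0.0           # wins from the perspective of the parent's player

    def untried_moves(self):
        tried = {c.move for c in self.children}
        return [m for m in (1, 2, 3) if m <= self.stones and m not in tried]

def uct_select(node, c=1.4):
    # Selection: pick the child maximising the UCB1 score.
    return max(node.children,
               key=lambda ch: ch.wins / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def rollout(stones, player):
    # Simulation: play random moves until someone takes the last stone.
    while True:
        stones -= random.choice([m for m in (1, 2, 3) if m <= stones])
        if stones == 0:
            return player
        player = 3 - player

def mcts(root_stones, root_player, iterations=4000):
    root = Node(root_stones, root_player)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while fully expanded and non-terminal.
        while not node.untried_moves() and node.children:
            node = uct_select(node)
        # 2. Expansion: add one untried child, if any.
        moves = node.untried_moves()
        if moves:
            m = random.choice(moves)
            node.children.append(Node(node.stones - m, 3 - node.player, node, m))
            node = node.children[-1]
        # 3. Simulation from the new node (terminal: previous player just won).
        if node.stones == 0:
            winner = 3 - node.player
        else:
            winner = rollout(node.stones, node.player)
        # 4. Backpropagation of the result up to the root.
        while node is not None:
            node.visits += 1
            if node.parent is not None and winner == node.parent.player:
                node.wins += 1
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move
```

From 5 stones the winning move is to take 1 (leaving a losing position of 4); the visit counts accumulated by the search converge on it without any hand-crafted evaluation function, which is exactly why the technique scales to Go where such functions are hard to design.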
PART: Pre-trained Authorship Representation Transformer
Authors writing documents imprint identifying information within their texts:
vocabulary, registry, punctuation, misspellings, or even emoji usage. Finding
these details is very relevant to profile authors, relating back to their
gender, occupation, age, and so on. But most importantly, repeating writing
patterns can help attribute authorship to a text. Previous works use
hand-crafted features or classification tasks to train their authorship models,
leading to poor performance on out-of-domain authors. A better approach to this
task is to learn stylometric representations, but this by itself is an open
research challenge. In this paper, we propose PART: a contrastively trained
model fit to learn \textbf{authorship embeddings} instead of semantics. By
comparing pairs of documents written by the same author, we are able to
determine the authorship of a text by evaluating the cosine similarity of the
compared documents, a zero-shot generalization to authorship identification.
To this end, a pre-trained Transformer with an LSTM head is trained with the
contrastive training method. We train our model on a diverse set of authors,
from literature, anonymous blog posters and corporate emails; a heterogeneous
set with distinct and identifiable writing styles. The model is evaluated on
these datasets, achieving 72.39\% zero-shot accuracy and 86.73\% top-5
accuracy on the joint evaluation dataset when determining
authorship from a set of 250 different authors. We qualitatively assess the
representations with different data visualizations on the available datasets,
profiling features such as book types, gender, age, or occupation of the
author.
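The zero-shot attribution step described above reduces to a nearest-neighbour search under cosine similarity. A minimal sketch, assuming embeddings have already been produced by an encoder (here they are plain vectors, not PART outputs):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def attribute(query_emb, reference_embs):
    """Attribute a query document to the reference author whose
    embedding is most similar, returning (index, all scores)."""
    scores = [cosine_similarity(query_emb, ref) for ref in reference_embs]
    return int(np.argmax(scores)), scores
```

Because the comparison needs no retraining per author, adding a new candidate author only requires embedding one of their documents, which is the zero-shot property the abstract refers to.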
Improving prediction intervals using measured solar power with a multi-objective approach
Prediction intervals are pairs of lower and upper bounds around point forecasts and are useful to take into account the uncertainty of predictions. This article studies the influence of using measured solar power, available at prediction time, on the quality of prediction intervals. While previous studies have suggested that using measured variables can improve point forecasts, little research has been done on whether that additional information yields prediction intervals with less uncertainty. With this aim, a multi-objective particle swarm optimization method was used to train neural networks whose outputs are the interval bounds. The inputs to the network were the measured solar power in addition to hourly meteorological forecasts. The study was carried out on data from three different locations and for five forecast horizons, from 1 to 5 h. The results were compared with two benchmark methods (quantile regression and quantile regression forests), and the Wilcoxon test was used to assess statistical significance. The results show that using measured power reduces the uncertainty associated with the prediction intervals, but mainly for the short forecasting horizons.
This work was funded by the Spanish Ministry of Science under contract ENE2014-56126-C2-2-R (AOPRIN-SOL project).
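Interval quality is usually judged by two competing criteria that a multi-objective optimiser trades off: coverage and width. The following is a sketch of the standard PICP and PINAW metrics, an assumption about the exact formulation rather than something taken from the article:

```python
import numpy as np

def picp(y, lower, upper):
    """Prediction Interval Coverage Probability: fraction of observed
    values that fall inside their [lower, upper] interval."""
    y, lower, upper = map(np.asarray, (y, lower, upper))
    return float(np.mean((y >= lower) & (y <= upper)))

def pinaw(lower, upper, y_range):
    """Prediction Interval Normalized Average Width: mean interval
    width divided by the range of the target variable."""
    return float(np.mean(np.asarray(upper) - np.asarray(lower)) / y_range)
```

A good interval model pushes PICP up towards the nominal coverage while keeping PINAW low; optimising both at once is what motivates the multi-objective particle swarm approach.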
Deep learning for understanding multilabel imbalanced Chest X-ray datasets
Over the last few years, convolutional neural networks (CNNs) have dominated the field of computer vision thanks to their ability to extract features and their outstanding performance in classification problems, for example in the automatic analysis of X-rays. Unfortunately, these neural networks are considered black-box algorithms, i.e. it is impossible to understand how the algorithm has reached the final result. To apply these algorithms in different fields and test how the methodology works, we need to use eXplainable AI techniques. Most of the work in the medical field focuses on binary or multiclass classification problems. However, in many real-life situations, such as chest X-rays, radiological signs of different diseases can appear at the same time. This gives rise to what is known as "multilabel classification problems". A disadvantage of these tasks is class imbalance, i.e. different labels do not have the same number of samples. The main contribution of this paper is a Deep Learning methodology for imbalanced, multilabel chest X-ray datasets. It establishes a baseline for the currently underutilised PadChest dataset and a new eXplainable AI technique based on heatmaps, which also includes probabilities and inter-model matching. The results of our system are promising, especially considering the number of labels used. Furthermore, the heatmaps match the expected areas, i.e. they mark the areas that an expert would use to make a decision.
This work has been funded by Grant PLEC2021-007681 (XAI-DisInfodemics) and PID2020-117263GB-100 (FightDIS) funded by MCIN/AEI/10.13039/501100011033 and, as appropriate, by "ERDF A way of making Europe", by the "European Union NextGenerationEU/PRTR", by the research project CIVIC: Intelligent characterisation of the veracity of the information related to COVID-19, granted by BBVA FOUNDATION GRANTS FOR SCIENTIFIC RESEARCH TEAMS SARS-CoV-2 and COVID-19, by the European Commission under IBERIFIER - Iberian Digital Media Research and Fact-Checking Hub (2020-EU-IA-0252), by "Convenio Plurianual with the Universidad Politécnica de Madrid in the actuation line of Programa de Excelencia para el Profesorado Universitario", and by Comunidad Autónoma de Madrid under grant S2018/TCS-4566 (CYNAMON). M. Sánchez-Montañés has been supported by grants PID2021-127946OB-I00 and PID2021-122347NB-I00 (funded by MCIN/AEI/10.13039/501100011033 and ERDF - "A way of making Europe") and Comunidad Autónoma de Madrid, Spain (S2017/BMD-3688 MULTI-TARGET&VIEW-CM grant). J. Del Ser thanks the financial support of the Spanish Centro para el Desarrollo Tecnológico Industrial (CDTI, Ministry of Science and Innovation) through the "Red Cervera" Programme (AI4ES project), as well as the support of the Basque Government (consolidated research group MATHMODE, ref. IT1456-22).
Evolutionary-based prediction interval estimation by blending solar radiation forecasting models using meteorological weather types
Recent research has shown that the integration or blending of different forecasting models is able to improve the predictions of solar radiation. However, most works perform model blending to improve point forecasts, and the integration of forecasting models to improve probabilistic forecasting has not received much attention. In this work the estimation of prediction intervals for the integration of four Global Horizontal Irradiance (GHI) forecasting models (Smart Persistence, WRF-solar, CIADcast, and Satellite) is addressed. Several short-term forecasting horizons, up to one hour ahead, have been analyzed. Within this context, one of the aims of the article is to study whether knowledge about the synoptic weather conditions, which are related to the stability of weather, might help to reduce the uncertainty represented by prediction intervals. In order to deal with this issue, information about which weather type is present at the time of prediction has been used by the blending model. Four weather types have been considered. A multi-objective variant of the Lower Upper Bound Estimation approach has been used for prediction interval estimation and compared with two baseline methods: Quantile Regression (QR) and Gradient Boosting (GBR). An exhaustive experimental validation has been carried out, using data registered at Seville, in the southern Iberian Peninsula. Results show that, in general, using weather type information reduces the uncertainty of prediction intervals according to all performance metrics used. More specifically, with respect to one of the metrics (the ratio between interval coverage and width), for high-coverage (0.90, 0.95) prediction intervals, using weather type enhances the ratio of the multi-objective approach by 2%. Also, comparing the multi-objective approach with the two baselines for high-coverage intervals, the improvement is 11% over QR and 10% over GBR. Improvements for low-coverage intervals (0.85) are smaller.
The authors are supported by projects funded by Agencia Estatal de Investigación, Spain (PID2019-107455RB-C21 and PID2019-107455RB-C22/AEI/10.13039/501100011033). Also supported by the Spanish Ministry of Economy and Competitiveness, projects ENE2014-56126-C2-1-R and ENE2014-56126-C2-2-R (http://prosol.uc3m.es). The University of Jaén team is also supported by FEDER, Spain funds and by the Junta de Andalucía, Spain (Research group TEP-220).
Machine Learning methods for solar irradiance forecast blending and estimation
International Mention in the doctoral degree.
Renewable energies are the leading alternative to fossil fuels, facing the constant
threat of climate change. The development of these new resources has
grown in recent years, especially in the fields of solar and wind energy. These
renewable power sources have raised a series of research challenges that, to
this date, remain unsolved, with many contributions to this end in the last
decade. The role of estimation and forecasting of solar energy is key to the
development of the solar energy market, because it reduces instrumentation
costs and improves the efficiency of solar energy participation in the
power grid. The forecast of solar energy is fundamental to estimate costs and
operational regulations of a solar plant, although the intermittency of solar
energy makes this a difficult task. On the other hand, the estimation of solar
irradiance can replace expensive measuring devices such as pyranometers or
pyrheliometers; or the need of expert supervision on meteorological stations
for cloud type classification.
In order to improve estimation, two proposals are studied. The first approach
is the automatic classification of clouds by including ceilometer information.
The ceilometer is a device capable of measuring the height and thickness of a
cloud, information that has never before been applied to cloud classification.
The next proposal is the estimation of irradiance by directly analyzing
images with Convolutional Networks and multiple perspectives, a never
before used technique for solar energy estimation. To improve forecasting,
the integration of prediction models is proposed. This technique compares
and combines existing predictive models to obtain a final, more accurate, prediction.
Although this is not a new approach, it has never been applied to
various prediction models specialized in different horizons, or for short-term
forecasting.
Given that clouds produce the greatest interference between extraterrestrial
and surface irradiance, whole-sky cloud images are a valuable source of
data for radiation estimation. To study the cloud type classification problem a
Random Forest algorithm is employed. The algorithm is trained using information
from cloud height and thickness, which is combined with camera image
features. Including cloud height and width proves to noticeably improve
accuracy even when difficult to classify cloud types are included. Results for
10-class cloud classification, including multiple clouds in a single image, show
71.12% accuracy, an improvement over the 50.6% achieved without ceilometer information.
This study shows the positive impact of ceilometer information in
the cloud classification problem.
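Appending ceilometer readings to image features before training the Random Forest can be sketched as follows. The feature names and the synthetic data are hypothetical; only the idea of combining cloud height and thickness with camera image statistics comes from the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 400

# Hypothetical image features: e.g. colour/texture statistics per image.
image_stats = rng.normal(size=(n, 5))
# Hypothetical ceilometer readings: cloud base height and thickness (metres).
cloud_height = rng.uniform(100.0, 12000.0, size=(n, 1))
cloud_thickness = rng.uniform(10.0, 3000.0, size=(n, 1))

# The combined feature matrix simply concatenates both sources.
X = np.hstack([image_stats, cloud_height, cloud_thickness])
# Synthetic labels loosely tied to height so the forest has signal to learn
# (real labels would be the 10 cloud types).
y = (cloud_height[:, 0] > 6000.0).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
```

The point of the sketch is the feature concatenation: the forest sees the ceilometer channels as just two extra columns, which is what makes the integration straightforward.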
Irradiance can also be estimated directly from camera images.
To address this problem, various models have been created using convolutional
neural networks, a Machine Learning technique suited to image recognition. Two
approaches are proposed, a model with information from a single camera and
a model with multiple sky perspectives. In addition to the common RGB
colour channels used in image processing, two new channels are included: the
distance from a pixel to the sun and the cloudy pixels of an image. Multiple
perspectives noticeably improve all the proposed alternatives, proving the
contribution of the proposed multi-view convolutional network.
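Building a five-channel network input as described above (RGB plus a sun-distance channel and a cloud mask) can be sketched as follows; the normalisation and the brightness-threshold cloud mask are illustrative assumptions, not the thesis' exact preprocessing:

```python
import numpy as np

def build_input(rgb, sun_xy, cloud_threshold=200):
    """Stack the three colour channels with two engineered channels:
    per-pixel distance to the sun position and a binary cloud mask.
    rgb: (h, w, 3) array with values in 0..255; sun_xy: (x, y) pixel."""
    h, w, _ = rgb.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # Channel 4: Euclidean distance of each pixel to the sun, scaled to [0, 1].
    sun_dist = np.sqrt((xx - sun_xy[0]) ** 2 + (yy - sun_xy[1]) ** 2)
    sun_dist = sun_dist / sun_dist.max()
    # Channel 5: crude cloud mask, flagging bright pixels as cloud.
    brightness = rgb.mean(axis=2)
    cloud_mask = (brightness > cloud_threshold).astype(float)
    return np.dstack([rgb / 255.0, sun_dist, cloud_mask])  # shape (h, w, 5)
```

Each camera view would be passed through this function, and a multi-view model then consumes one such tensor per perspective.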
There are many predictive models that forecast with diverse capabilities at
different prediction horizons. In this thesis, combining them is called forecast
integration (or blending). An integration model is proposed to blend four
physical models from four meteorological stations in the south of the Iberian
Peninsula. Using support vector regression, these are combined in linear
and non-linear ways, with the four predictors as inputs to the machine learning model.
Two approaches are presented: a horizon approach that builds a model for
each predictive horizon, and a general approach that builds a single prediction
model for all horizons.
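The blending step can be sketched with support vector regression over four base forecasts. The synthetic forecasts and the SVR hyperparameters below are illustrative assumptions; the idea taken from the text is only that the four predictors become the inputs and the observation the target:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
n = 300
truth = rng.uniform(0.0, 1.0, n)  # "observed" normalised irradiance

# Four hypothetical base forecasts: the truth corrupted by different
# error levels, standing in for four physical prediction models.
preds = np.column_stack(
    [truth + rng.normal(0.0, s, n) for s in (0.05, 0.08, 0.10, 0.15)]
)

# Non-linear blending: fit an RBF-kernel SVR on the four predictors.
blender = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(preds, truth)
blended = blender.predict(preds)

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))
```

A per-horizon variant would fit one such blender per forecast horizon, while the general approach trains a single model on data pooled across horizons.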
In addition, a regional model is proposed, capable of making predictions
at a regional level instead of a station level. Results from integration are very
positive compared with the baseline models for global and direct irradiance.
Some absolute improvements reach 15% when comparing integration models
to any predictor model when rRMSE and rMAE are evaluated on global
and direct irradiance. At a regional level, there are also improvements, at an
absolute 5% on global radiation over the predictor models and 10% for direct
irradiance. The general approach is especially remarkable because, using a
single model, it can obtain the best results on rMAE and match the results of
other integration models on rRMSE.
This dissertation has been developed under the project PROSOL ENE2014-56126-C2 (Towards
an integrated model for solar energy forecasting) in collaboration with the research
group MATRAS (University of Jaen) and funded by the Ministry of Science and Innovation
(Spanish Government). All the data shown in this text has been provided by MATRAS and
has been used with their permission.
Doctoral Programme in Computer Science and Technology, Universidad Carlos III de Madrid. Chair: Pedro Isasi Viñuela. Secretary: Esteban García Cuesta. Examiner: Ricardo Simón Carbaj
Using Smart Persistence and Random Forests to Predict Photovoltaic Energy Production
Solar energy forecasting is an active research problem and a key issue for increasing the competitiveness of solar power plants in the energy market. However, using meteorological, production, or irradiance data from the past is not enough to produce accurate forecasts. This article aims to integrate a prediction algorithm (Smart Persistence), irradiance, and past production data using a state-of-the-art machine learning technique (Random Forests). Three years of data from six solar PV modules at Faro (Portugal) are analyzed. A set of features that combines past data, predictions, averages, and variances is proposed for training and validation. The experimental results show that using Smart Persistence as a Machine Learning input greatly improves the accuracy of short-term forecasts, achieving an NRMSE of 0.25 on the best panels at short horizons and 0.33 on a 6 h horizon.
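Feeding a baseline forecast to a Random Forest alongside past production can be sketched as follows. The synthetic production curve and the persistence proxy (carrying the last observed value forward rather than scaling by a clear-sky ratio, as a real Smart Persistence model would) are simplifying assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
t = np.arange(1000)
# Synthetic PV "production": a clipped daily-shaped sinusoid plus noise
# (48 steps per day), standing in for real panel data.
production = (np.clip(np.sin(2 * np.pi * t / 48), 0.0, None)
              + rng.normal(0.0, 0.05, t.size))

horizon = 3  # predict 3 steps ahead
lags = 3     # number of past-production features

rows, targets = [], []
for i in range(lags, production.size - horizon):
    # Smart Persistence proxy: carry the latest observed value forward.
    persistence = production[i - 1]
    # Feature vector: recent lags of production + the baseline forecast.
    rows.append(list(production[i - lags:i]) + [persistence])
    targets.append(production[i + horizon])

X, y = np.array(rows), np.array(targets)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
```

The key design choice mirrored from the abstract is that the baseline prediction is just another input column, letting the forest learn when to trust persistence and when to correct it.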
BERTuit: Understanding Spanish language in Twitter through a native transformer
The appearance of complex attention-based language models such as BERT,
RoBERTa or GPT-3 has made it possible to address highly complex tasks in a plethora of
scenarios. However, when applied to specific domains, these models encounter
considerable difficulties. This is the case of Social Networks such as Twitter,
an ever-changing stream of information written with informal and complex
language, where each message requires careful evaluation to be understood even
by humans given the important role that context plays. Addressing tasks in this
domain through Natural Language Processing involves severe challenges. When
powerful state-of-the-art multilingual language models are applied to this
scenario, language-specific nuances tend to get lost in translation. To face
these challenges we present \textbf{BERTuit}, the largest transformer proposed
so far for the Spanish language, pre-trained on a massive dataset of 230M Spanish
tweets using RoBERTa optimization. Our motivation is to provide a powerful
resource to better understand Spanish Twitter and to be used on applications
focused on this social network, with special emphasis on solutions devoted to
tackle the spreading of misinformation in this platform. BERTuit is evaluated
on several tasks and compared against M-BERT, XLM-RoBERTa and XLM-T, very
competitive multilingual transformers. The utility of our approach is shown
with applications, in this case: a zero-shot methodology to visualize groups of
hoaxes and to profile authors spreading disinformation.
Misinformation spreads wildly on platforms such as Twitter in languages other
than English, meaning performance of transformers may suffer when transferred
outside English-speaking communities.
Comment: Support: 1) BBVA FOUNDATION - CIVIC, 2) Spanish Ministry of Science
and Innovation - FightDIS (PID2020-117263GB-100) and XAI-Disinfodemics
(PLEC2021-007681), 3) Comunidad Autonoma de Madrid - S2018/TCS-4566, 4)
European Commission - IBERIFIER (2020-EU-IA-0252), 5) Digital Future Society
(Mobile World Capital Barcelona) - DisTrack, 6) UPM - Programa de Excelencia
para el Profesorado Universitari